AITopics | evidence sentence

Collaborating Authors

evidence sentence

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Can LLMs Estimate Cognitive Complexity of Reading Comprehension Items?

Hwang, Seonjeong, Kim, Hyounghun, Lee, Gary Geunbae

arXiv.org Artificial Intelligence, Oct-30-2025

Estimating the cognitive complexity of reading comprehension (RC) items is crucial for assessing item difficulty before the items are administered to learners. Unlike syntactic and semantic features, such as passage length or semantic similarity between options, cognitive features that arise during answer reasoning are not readily extractable using existing NLP tools and have traditionally relied on human annotation. In this study, we examine whether large language models (LLMs) can estimate the cognitive complexity of RC items by focusing on two dimensions, Evidence Scope and Transformation Level, that indicate the degree of cognitive burden involved in reasoning about the answer. Our experimental results demonstrate that LLMs can approximate the cognitive complexity of items, indicating their potential as tools for prior difficulty analysis. Further analysis reveals a gap between LLMs' reasoning ability and their metacognitive awareness: even when they produce correct answers, they sometimes fail to correctly identify the features underlying their own reasoning process.

cognitive complexity, large language model, natural language, (18 more...)

arXiv.org Artificial Intelligence

2510.25064

Genre: Research Report > New Finding (1.00)

Industry:

Education > Educational Setting (0.94)
Education > Assessment & Standards > Student Performance (0.71)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Improving the fact-checking performance of language models by relying on their entailment ability

Kumar, Gaurav, Mazumder, Debajyoti, Garg, Ayush, Patro, Jasabanta

arXiv.org Artificial Intelligence, Oct-22-2025

Automated fact-checking has been a challenging task for the research community. Past works tried various strategies, such as end-to-end training, retrieval-augmented generation, and prompt engineering, to build robust fact-checking systems. However, their accuracy has not been high enough for real-world deployment. We, on the other hand, propose a simple yet effective strategy, where entailed justifications generated by LLMs are used to train encoder-only language models (ELMs) for fact-checking. We conducted a rigorous set of experiments, comparing our approach with recent works and various prompting and fine-tuning strategies to demonstrate the superiority of our approach. Additionally, we performed a quality analysis of model explanations, ablation studies, and error analysis to provide a comprehensive understanding of our approach.

justification, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2505.1505

Country:

Asia (1.00)
Europe (0.92)
North America > United States > Wisconsin (0.14)

Genre:

Workflow (0.92)
Research Report > New Finding (0.67)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Law (1.00)
Health & Medicine (1.00)
(6 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

How Grounded is Wikipedia? A Study on Structured Evidential Support and Retrieval

Walden, William, Ricci, Kathryn, Wanner, Miriam, Jiang, Zhengping, May, Chandler, Zhou, Rongkun, Van Durme, Benjamin

arXiv.org Artificial Intelligence, Oct-10-2025

Wikipedia is a critical resource for modern NLP, serving as a rich repository of up-to-date and citation-backed information on a wide variety of subjects. The reliability of Wikipedia -- its groundedness in its cited sources -- is vital to this purpose. This work analyzes both how grounded Wikipedia is and how readily fine-grained grounding evidence can be retrieved. To this end, we introduce PeopleProfiles -- a large-scale, multi-level dataset of claim support annotations on biographical Wikipedia articles. We show that: (1) ~22% of claims in Wikipedia lead sections are unsupported by the article body; (2) ~30% of claims in the article body are unsupported by their publicly accessible sources; and (3) real-world Wikipedia citation practices often differ from documented standards. Finally, we show that complex evidence retrieval remains a challenge -- even for recent reasoning rerankers.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2506.12637

Country:

North America > United States (1.00)
Europe (0.67)

Genre: Research Report (0.50)

Industry:

Health & Medicine (0.46)
Media > Film (0.46)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Utilizing LLMs to Investigate the Disputed Role of Evidence in Electronic Cigarette Health Policy Formation in Australia and the UK

Curran, Damian, Chapman, Brian, Conway, Mike

arXiv.org Artificial Intelligence, May-13-2025

Australia and the UK have developed contrasting approaches to the regulation of electronic cigarettes: broadly speaking, Australia has adopted a relatively restrictive approach and the UK a more permissive one. Notably, these divergent policies were developed from the same broad evidence base. In this paper, to investigate differences in how the two jurisdictions manage and present evidence, we developed and evaluated a Large Language Model-based sentence classifier to perform automated analyses of electronic cigarette-related policy documents drawn from official Australian and UK legislative processes (109 documents in total). Specifically, we utilized GPT-4 to automatically classify sentences based on whether they contained claims that e-cigarettes were broadly helpful or harmful for public health. Our LLM-based classifier achieved an F-score of 0.9. Further, when applying the classifier to our entire sentence-level corpus, we found that Australian legislative documents show a much higher proportion of harmful statements and a lower proportion of helpful statements compared to the expected values, with the opposite holding for the UK. In conclusion, this work utilized an LLM-based approach to provide evidence to support the contention that, drawing on the same evidence base, Australian policy documents on electronic nicotine delivery systems (ENDS) emphasize the harms associated with ENDS products and UK policy documents emphasize the benefits. Further, our approach provides a starting point for using LLM-based methods to investigate the complex relationship between evidence and health policy formation.
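As a reading aid for the reported F-score, the following is a minimal sketch of how such a score can be computed for a sentence classifier. The abstract does not say which F-score variant was used, so this assumes the common F1 (harmonic mean of precision and recall) for a single positive class. The label names and the toy data are hypothetical and are not taken from the paper.

```python
def f1_score(gold, predicted, positive):
    """F1 for one class: harmonic mean of precision and recall.

    Assumption: the listing above only reports "an F-score of 0.9";
    F1 for a single positive class is used here for illustration.
    """
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentence labels, invented for this example.
gold = ["harmful", "harmful", "helpful", "neither", "harmful"]
predicted = ["harmful", "helpful", "helpful", "harmful", "harmful"]
score = f1_score(gold, predicted, positive="harmful")
print(round(score, 3))
```

In this toy data the classifier finds two of the three harmful sentences and raises one false alarm, so precision and recall are both 2/3.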

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2505.06782

Country: Oceania > Australia > Victoria (0.15)

Genre: Research Report > New Finding (0.47)

Industry:

Health & Medicine > Public Health (1.00)
Consumer Products & Services > Food, Beverage, Tobacco & Cannabis (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

CDER: Collaborative Evidence Retrieval for Document-level Relation Extraction

Tran, Khai Phan, Li, Xue

arXiv.org Artificial Intelligence, Apr-10-2025

Document-level Relation Extraction (DocRE) involves identifying relations between entities across multiple sentences in a document. Evidence sentences, which are crucial for precisely identifying relations between entity pairs, focus attention on essential text segments and thereby improve DocRE performance. However, existing evidence retrieval systems often overlook the collaborative nature among semantically similar entity pairs in the same document, hindering the effectiveness of the evidence retrieval task. To address this, we propose a novel evidence retrieval framework, namely CDER. CDER employs an attentional graph-based architecture to capture collaborative patterns and incorporates a dynamic sub-structure for additional robustness in evidence retrieval. Experimental results on the benchmark DocRE dataset show that CDER not only excels in the evidence retrieval task but also enhances the overall performance of existing DocRE systems.

artificial intelligence, entity pair, natural language, (13 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/978-981-97-4982-9_3

2504.06529

Country:

Oceania > Australia (0.28)
North America > United States (0.28)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.34)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

Collapse of Dense Retrievers: Short, Early, and Literal Biases Outranking Factual Evidence

Fayyaz, Mohsen, Modarressi, Ali, Schuetze, Hinrich, Peng, Nanyun

arXiv.org Artificial Intelligence, Mar-6-2025

Dense retrieval models are commonly used in Information Retrieval (IR) applications, such as Retrieval-Augmented Generation (RAG). Since they often serve as the first step in these systems, their robustness is critical to avoid failures. In this work, by repurposing a relation extraction dataset (e.g. Re-DocRED), we design controlled experiments to quantify the impact of heuristic biases, such as favoring shorter documents, in retrievers like Dragon+ and Contriever. Our findings reveal significant vulnerabilities: retrievers often rely on superficial patterns like over-prioritizing document beginnings, shorter documents, repeated entities, and literal matches. Additionally, they tend to overlook whether the document contains the query's answer, lacking deep semantic understanding. Notably, when multiple biases combine, models exhibit catastrophic performance degradation, selecting the answer-containing document in less than 3% of cases over a biased document without the answer. Furthermore, we show that these biases have direct consequences for downstream applications like RAG, where retrieval-preferred documents can mislead LLMs, resulting in a 34% performance drop compared with providing no documents at all.
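The "less than 3% of cases" figure is a pairwise preference rate: for each query, the retriever scores an answer-containing document and a biased document without the answer, and the rate is the share of pairs in which the former wins. The sketch below shows that calculation under stated assumptions. The function name and the scores are hypothetical, and counting ties as losses is a choice made here, not something the abstract specifies.

```python
def answer_preference_rate(score_pairs):
    """Share of pairs where the answer-containing document outscores the biased one.

    Each pair is (score_with_answer, score_biased_without_answer).
    Ties count as losses here; that is an assumption of this sketch.
    """
    if not score_pairs:
        raise ValueError("need at least one score pair")
    wins = sum(1 for with_answer, biased in score_pairs if with_answer > biased)
    return wins / len(score_pairs)


# Hypothetical retriever similarity scores, invented for illustration.
pairs = [(0.62, 0.71), (0.80, 0.55), (0.40, 0.90), (0.33, 0.35)]
rate = answer_preference_rate(pairs)
print(f"{rate:.0%}")
```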

computational linguistic, retrieval, retriever, (14 more...)

arXiv.org Artificial Intelligence

2503.05037

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Africa > Nigeria (0.05)
North America > United States > New York > New York County > New York City (0.04)
(21 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.88)

Industry: Government > Regional Government > North America Government > United States Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Say Less, Mean More: Leveraging Pragmatics in Retrieval-Augmented Generation

Riaz, Haris, Riloff, Ellen, Surdeanu, Mihai

arXiv.org Artificial Intelligence, Feb-27-2025

We propose a simple, unsupervised method that injects pragmatic principles into retrieval-augmented generation (RAG) frameworks such as Dense Passage Retrieval to enhance the utility of retrieved contexts. Our approach first identifies which sentences in a pool of documents retrieved by RAG are most relevant to the question at hand, covering all the topics addressed in the input question and no more, and then highlights these sentences within their context before they are provided to the LLM, without truncating or altering the context in any other way. We show that this simple idea brings consistent improvements in experiments on three question answering tasks (ARC-Challenge, PubHealth and PopQA) using five different LLMs. It notably enhances relative accuracy by up to 19.7% on PubHealth and 10% on ARC-Challenge compared to a conventional RAG system.
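The idea of marking relevant sentences in place, rather than cutting the context down, can be illustrated with a small sketch. This is not the paper's method: the relevance score here is plain token overlap with the question, and the splitter, the `highlight` function and the `<evidence>` markers are all hypothetical choices made for the example.

```python
import re


def split_sentences(text):
    # Naive splitter: break after ., ! or ? when followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def highlight(question, context, top_k=1,
              open_tag="<evidence>", close_tag="</evidence>"):
    """Wrap the top_k sentences sharing the most tokens with the question.

    Every sentence is kept and only markers are added. Sentences are
    re-joined with single spaces, so original line breaks are not preserved.
    """
    q = tokens(question)
    sentences = split_sentences(context)
    overlap = [len(q & tokens(s)) for s in sentences]
    ranked = sorted(range(len(sentences)), key=lambda i: overlap[i], reverse=True)
    chosen = {i for i in ranked[:top_k] if overlap[i] > 0}
    return " ".join(
        f"{open_tag}{s}{close_tag}" if i in chosen else s
        for i, s in enumerate(sentences)
    )


context = "Paris is the capital of France. It rains often. Berlin is in Germany."
marked = highlight("What is the capital of France?", context)
print(marked)
```

Because nothing is removed, the model still sees the full retrieved passage, and the markers only indicate where to look first.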

evidence sentence, query, reasoning, (15 more...)

arXiv.org Artificial Intelligence

2502.17839

Country:

North America > United States > Arizona > Pima County > Tucson (0.14)
Asia > Japan (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Asia > Singapore (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine (1.00)
Education (0.68)
Leisure & Entertainment (0.67)
Government > Regional Government > North America Government > United States Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

SelfElicit: Your Language Model Secretly Knows Where is the Relevant Evidence

Liu, Zhining, Amjad, Rana Ali, Adkathimar, Ravinarayana, Wei, Tianxin, Tong, Hanghang

arXiv.org Artificial Intelligence, Feb-12-2025

Providing Language Models (LMs) with relevant evidence in the context (either retrieved or user-provided) can significantly improve their ability to provide factually correct, grounded responses. However, recent studies have found that LMs often struggle to fully comprehend and utilize key evidence from the context, especially when it contains noise and irrelevant information, an issue common in real-world scenarios. To address this, we propose SelfElicit, an inference-time approach that helps LMs focus on key contextual evidence through self-guided explicit highlighting. By leveraging the inherent evidence-finding capabilities of LMs using the attention scores of deeper layers, our method automatically identifies and emphasizes key evidence within the input context, facilitating more accurate and factually grounded responses without additional training or iterative prompting. We demonstrate that SelfElicit brings consistent and significant improvement on multiple evidence-based QA tasks for various LM families while maintaining computational efficiency. Our code and documentation are available at https://github.com/ZhiningLiu1998/SelfElicit.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2502.08767

Country:

Oceania > Australia > South Australia (0.15)
North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Austria > Vienna (0.14)
(29 more...)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment > Sports > Soccer (1.00)
Media > Music (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)
